Skip to content

move all hardware metrics to their own yaml files #2380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

trisch-me
Copy link
Contributor

@trisch-me trisch-me commented Jun 16, 2025

Fix #1309

Continue to move hardware metrics to their own files.

@bertysentry I would need you guidance as it wasn’t possible to move metrics out as is due to restrictions:

We can’t have both x.y and x.y.z names as declaring x.y.z makes x.y automatically a namespace. so you can’t use it as leaf node. In our case we have originally hw.battery.charge and hw.battery.charge.limit. I have changed the latter to the hw.battery.charge_limit but I’m not sure if this is a correct way and if it breaks anything?

Another moment is about hw.status metric from https://opentelemetry.io/docs/specs/semconv/system/hardware-metrics/#hwbattery---battery-metrics. Do we need to define it separately as I did in this request? It relies on hw.type to be the only type (in this case battery) and state to be the set of specific states for each component and it doesn’t play well together with previous definitions and current restrictions on enums in semconv. We are working on them but before making a final contribution I just want to clarify these points as every other component has the similar set of questions i.e. hw.cpu.speed and hw.cpu.speed.limit and hw.status with another set of values.

Some of the components have only hw.status. And what this actually mean - for example we do define it as hw.enclosure.* - Enclosure metrics but later say just hw.status. Should it be hw.enclosure.status?

cc @jsuereth for the hints on metric solution in this case

@github-actions github-actions bot added enhancement New feature or request area:hardware labels Jun 17, 2025
@github-project-automation github-project-automation bot moved this from Untriaged to Blocked in Semantic Conventions Triage Jun 20, 2025
@bertysentry
Copy link
Contributor

@trisch-me I am surprised by this new requirement that you mentioned:

We can’t have both x.y and x.y.z names as declaring x.y.z makes x.y automatically a namespace. so you can’t use it as leaf node. In our case we have originally hw.battery.charge and hw.battery.charge.limit. I have changed the latter to the hw.battery.charge_limit but I’m not sure if this is a correct way and if it breaks anything?

My opinion is that it's a shame that this limitation is caused by our tooling only. OTLP itself totally supports having hw.battery.charge and hw.battery.charge.limit.

The .limit suffix has also being used to specify thresholds, like hw.temperature.limit. Because of this limitation (namespace vs leaf), we will probably have to rename our temperature metric to hw.temperature.temperature, which is... suboptimal.

Can we add a special status for *.limit metrics, to make sure they don't create a namespace? It shouldn't be too complicated.

If not, my suggestion for battery charge and limit is to go simply with:

  • hw.battery.charge (indicates the level of charge of the battery)
  • hw.battery.limit (indicates the charge limits at which the battery can operate properly)

Thank you!

@bertysentry
Copy link
Contributor

bertysentry commented Jun 20, 2025

@trisch-me WRT "Some of the components have only hw.status. And what this actually mean - for example we do define it as hw.enclosure.* - Enclosure metrics but later say just hw.status. Should it be hw.enclosure.status", that's an excellent question!

When writing the original semconv for hardware, I tried to group metrics per component type (CPU, memory, etc.). Each component had its own status metric, like hw.enclosure.status and hw.physical_disk.status, etc. But I was asked to refactor that as simply hw.status with hw.type="enclosure|physical_disk|...", which I think was a great idea! I now strongly recommend that we keep hw.status as a generally applicable metric to any kind of hardware component. It works really great to assess hardware problems, with really simply queries.

Edit: "I WAS asked", and not "I asked" and then thought it's such a great idea 😅

@braydonk
Copy link
Contributor

braydonk commented Jun 20, 2025

Can we add a special status for *.limit metrics, to make sure they don't create a namespace? It shouldn't be too complicated.

I'm in favour of this or revisiting this whole notion of metric names and namespaces etc.; the original issue for it (#50) has been open for a while and the reasoning in favour of it doesn't seem super strong to me relative to the naming gymnastics it requires.

@trisch-me
Copy link
Contributor Author

I have added a question about metric names to the todays semconv meeting

@brunobat
Copy link

I wonder if we should follow the naming conventions already used by the car battery industry.
They are pretty precise and are already widely used... This could also open up OpenTelemetry to that field. See: https://en.wikipedia.org/wiki/Battery_management_system#Computation

Examples: State of charge (SoC), State of health (SoH), charge current limit(CCL)...

@bertysentry
Copy link
Contributor

@thompson-tomo So there is no way we could consider .limit as a "special" metric that doesn't imply a namespace?

I am deeply skeptical of the decision that was made by the TC on this. To allow an hypothetical hierarchical representation of metrics in a JSON payload, we're now trying to invent many workarounds for this artificial limitation, with metrics like hw.voltage.actual, or hw.voltage.current (which is very very confusing).

The TC probably didn't realize the weird consequences of the decision of preventing a metric from being the prefix of another metric. A xxx.yyy metric will prevent any xxx.yyy.zzz metric further down the road, resulting in weird decisions like "let's rename xxx.yyy to xxx.yyy.current or xxx.yyy.value so that we can have xxx.yyy.zzz").

It will actually create a lot of instability in our semantic conventions!

Is there a way to "appeal" this decision? 😅

@thompson-tomo
Copy link
Contributor

When I look at hw.voltage that sounds alot like an entity which lacks a descriptor. Does it represent the expected voltage, actual voltage, minimum voltage etc.

@trisch-me
Copy link
Contributor Author

@bertysentry let’s discuss it today during the meeting once more, I have moved everything to the yaml files and we have no problems right now with tooling (but because we don’t check namespace re-usage for the metrics). We can raise it once more to find the appropriate solution

@trisch-me
Copy link
Contributor Author

trisch-me commented Jul 7, 2025

@bertysentry we have agreed that we merge it as is to make it pure refactoring and extracting the data from md files to the yaml files and discuss that we should come up with a solution later on how we proceed with collision
so I need you factual check if all metrics are ok and I haven’t missed anything

Copy link
Contributor

@bertysentry bertysentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving as all my suggestions can be implemented in a future PR, if accepted.

Thank you @trisch-me for all this work! It's really helpful! 😊

@trisch-me trisch-me moved this from Blocked to Needs More Approval in Semantic Conventions Triage Jul 9, 2025
@trisch-me
Copy link
Contributor Author

During todays’ semconv meeting we have discussed that we will proceed as is with current changes (not changing anything substantial) and there was a request to create a hardware SIG to discuss any other changes to this domain.

@bertysentry @thompson-tomo and everyone else interested

@trisch-me trisch-me requested a review from thompson-tomo July 18, 2025 11:37
@trisch-me
Copy link
Contributor Author

I have also updated github link checker to exclude removed files from check

@trisch-me trisch-me dismissed thompson-tomo’s stale review July 21, 2025 08:36

we made decision not to change names

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I still have a lot of questions about this area of semconv, I think bringing it into YAML where we can see how it all hangs together with existing semconv is the right approach.

Let's not let perfect be the enemy of good - let's bring this into YAML so it benefits from automation and consistency that is brought through automation.

AFAICT - This is a faithful re-representation of the HW markdown into YAML, and it has @bertysentry's approval.

@trask trask enabled auto-merge August 4, 2025 15:07
@trisch-me
Copy link
Contributor Author

@joaopgrassi @trask @AlexanderWert @lmolkova could you please check this PR, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:hardware enhancement New feature or request
Projects
Status: Needs More Approval
Development

Successfully merging this pull request may close these issues.

Move hardware metrics to the registry
6 participants